Overview of the DSL Shared Task 2015

Authors

  • Marcos Zampieri
  • Liling Tan
  • Nikola Ljubešić
  • Jörg Tiedemann
  • Preslav Nakov

Abstract

We present the results of the 2nd edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the LT4VarDial’2015 workshop and focused on the identification of very similar languages and language varieties. Unlike the 2014 edition, the 2015 edition included an Others category containing languages that were not seen during training. Moreover, there were two test datasets: one using the original texts (test set A), and one with named entities replaced by placeholders (test set B). Ten teams participated in the task, and the best-performing system achieved 95.54% average accuracy on test set A and 94.01% on test set B.

Similar resources

Language Identification using Classifier Ensembles

In this paper we describe the language identification system we developed for the Discriminating Similar Languages (DSL) 2015 shared task. We constructed a classifier ensemble composed of several Support Vector Machine (SVM) base classifiers, each trained on a single feature type. Our feature types include character 1–6 grams and word unigrams and bigrams. Using this system we were able to outp...
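The feature types named in this abstract (character 1–6 grams, word unigrams and bigrams) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `char_ngrams` and `word_ngrams`, the whitespace tokenization, and the use of raw counts are all assumptions made for illustration, and the SVM training and ensemble combination are not covered.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=6):
    """Count character n-grams of every order from n_min to n_max.

    Hypothetical helper: the abstract only states that character
    1-6 grams were used, not how they were extracted or weighted.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def word_ngrams(text, n_min=1, n_max=2):
    """Count word n-grams (unigrams and bigrams by default).

    Assumes simple whitespace tokenization, which the source
    does not specify.
    """
    tokens = text.split()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    sentence = "ovo je dobro"
    print(char_ngrams(sentence, 1, 2).most_common(3))
    print(word_ngrams(sentence))
```

In an ensemble of the kind described, each feature type would presumably feed its own base classifier; here the two functions simply return separate count tables that could serve as such inputs.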

A Report on the DSL Shared Task 2014

This paper summarizes the methods, results and findings of the Discriminating between Similar Languages (DSL) shared task 2014. The shared task provided data from 13 different languages and varieties divided into 6 groups. Participants were required to train their systems to discriminate between languages on a training and development set containing 20,000 sentences from each language (closed s...

Discriminating between Similar Languages Using PPM

The paper presents the results of the Bobicev team's participation in the DSL (Discriminating Similar Languages) shared task 2015. It describes the use of PPM (Prediction by Partial Matching) for language discrimination. The presented system achieved an accuracy of 94.14% on the first set and 92.22% on the second set. The results ranked 4th for the first task and 5th for the second ...

DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model

We investigate two approaches to the automatic discrimination of similar languages: the Expectation-Maximization algorithm for estimating the conditional probability P(word|language), and a series of byte-level language models. The accuracy of these methods reached 86.6% and 88.3%, respectively, on set A of the DSL Shared Task 2016 competition.

Overview of TweetMT: A Shared Task on Machine Translation of Tweets at SEPLN 2015

This article presents an overview of the shared task that took place as part of the TweetMT workshop held at SEPLN 2015. The task consisted in translating collections of tweets from and to several languages. The article outlines the data collection and annotation process, the development and evaluation of the shared task, as well as the results achieved by the participants.

Publication date: 2015